There is an explosion of interest in processing and analyzing large datasets collected in very different
settings, including social and economic networks, information networks, the Internet and the World Wide Web,
immunization  and  epidemiology  networks,  molecular  and  gene  regulatory  networks,  citation  and 
coauthorship  studies,  friendship  networks,  as  well  as  physical  infrastructure  networks  like  sensor 
networks,  power  grids,  transportation  networks,  and  other  networked  critical  infrastructures.  We  briefly 
contrast  our  approach  in  this  project  with  existing  work. 

A.  Brief  review  of  the  literature 

Many  authors  focus  on  the  underlying  relational  structure  of  the  data  by:  1)  inferring  the  structure  from 
community  relations  and  friendships,  or  from  perceived  alliances  between  agents  as  abstracted  through 
game  theoretic  models  [1],  [2];  2)  quantifying  the  connectedness  of  the  world;  and  3)  determining  the 
relevance  of  particular  agents,  or  studying  the  strength  of  their  interactions.  Other  authors  are  interested 
in  the  network  function  by  quantifying  the  impact  of  the  network  structure  on  the  diffusion  of  disease, 
spread  of  news  and  information,  voting  trends,  imitation  and  social  influence,  crowd  behavior,  failure 
propagation, and global behaviors developing from seemingly random local interactions [2], [3], [4]. Many of
these  works  either  develop  or  assume  network  models  that  capture  the  interdependencies  among  the 
data  and  then  analyze  the  structural  properties  of  these  networks.  Models  often  considered  may  be 
deterministic, like complete or regular graphs, or random, like the Erdős-Rényi and Poisson graphs, the
configuration and expected degree models, and small-world or scale-free networks [2], [4], to mention a few.
These  models  are  used  to  quantify  network  characteristics,  such  as  connectedness,  existence  and  size  of 
the  giant  component,  distribution  of  component  sizes,  degree  and  clique  distributions,  and  node  or  edge 
specific  parameters  including  clustering  coefficients,  path  length,  diameter,  betweenness  and  closeness 
centralities. 

Another  body  of  literature  is  concerned  with  inference  and  learning  from  such  large  datasets.  Much  work 
falls  under  the  generic  label  of  graphical  models  [5],  [6],  [7],  [8],  [9],  [10].  In  graphical  models,  data  is 
viewed as a family of random variables indexed by the nodes of a graph, where the graph captures
probabilistic dependencies among data elements. The random variables are described by a family of joint
probability  distributions.  For  example,  directed  (acyclic)  graphs  [11],  [12]  represent  Bayesian  networks 
where each random variable is conditionally independent of its non-descendants given the variables defined on its parent
nodes.  Undirected  graphical  models,  also  referred  to  as  Markov  random  fields  [13],  [14],  describe  data 
where  the  variables  defined  on  two  sets  of  nodes  separated  by  a  boundary  set  of  nodes  are  statistically 
independent given the variables on the boundary set. A key tool in graphical models is the Hammersley-Clifford
theorem [13], [15], [16], and the Markov-Gibbs equivalence that, under appropriate positivity
conditions,  factors  the  joint  distribution  of  the  graphical  model  as  a  product  of  potentials  defined  on  the 
cliques  of  the  graph.  Graphical  models  exploit  this  factorization  and  the  structure  of  the  indexing  graph  to 
develop  efficient  algorithms  for  inference  by  controlling  their  computational  cost.  Inference  in  graphical 
models is generally defined as computing, from the joint distribution, lower-order marginal distributions,
likelihoods,  modes,  and  other  moments  of  individual  variables  or  their  subsets.  Common  inference 
algorithms  include  belief  propagation  and  its  generalizations,  as  well  as  other  message  passing 
algorithms. A recent block-graph algorithm for fast approximate inference, in which the nodes are
non-overlapping clusters of nodes from the original graph, is in [17]. Graphical models are employed in many
areas;  for  sample  applications,  see  [18]  and  references  therein. 
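
For reference, the clique factorization mentioned above can be written out as follows. The notation is an assumption introduced here for illustration: \mathcal{C} denotes the set of cliques of the graph, x_C the variables indexed by clique C, \psi_C a positive potential on that clique, and Z the normalizing constant (written as a sum, which assumes discrete variables).

```latex
p(x) \;=\; \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C),
\qquad
Z \;=\; \sum_{x} \prod_{C \in \mathcal{C}} \psi_C(x_C).
```

Inference algorithms such as belief propagation exploit the fact that each factor depends only on the variables of one clique.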

Extensive  work  is  dedicated  to  discovering  efficient  data  representations  for  large  high-dimensional  data 
[19],  [20],  [21],  [22].  Many  of  these  works  use  spectral  graph  theory  and  the  graph  Laplacian  [23]  to  derive 
low-dimensional  representations  by  projecting  the  data  on  a  low-dimensional  subspace  generated  by  a 
small  subset  of  the  Laplacian  eigenbasis.  The  graph  Laplacian  approximates  the  Laplace-Beltrami  operator 
on a compact manifold [21], [24], in the sense that if the dataset is large and is sampled uniformly at random
from a low-dimensional manifold, then the (empirical) graph Laplacian acting on a smooth function on this
manifold is a good discrete approximation that converges pointwise and uniformly to the elliptic
Laplace-Beltrami operator applied to this function as the number of points goes to infinity [25], [26], [27]. One can
go  beyond  the  choice  of  the  graph  Laplacian  by  choosing  discrete  approximations  to  other  continuous 
operators  and  obtaining  possibly  more  desirable  spectral  bases  for  the  characterization  of  the  geometry 
of  the  manifold  underlying  the  data.  For  example,  if  the  data  represents  a  non-uniform  sampling  of  a 
continuous manifold, a conjugate to an elliptic Schrödinger-type operator can be used [28], [29], [30].

More  in  line  with  the  research  we  developed  in  this  project,  several  works  have  proposed  multiple 
transforms  for  data  indexed  by  graphs.  Examples  include  regression  algorithms  [31],  wavelet 
decompositions  [30],  [32],  [33],  [34],  [35],  filter  banks  on  graphs  [36],  [37],  de-noising  [38],  and 
compression  [39].  Some  of  these  transforms  focus  on  distributed  processing  of  data  from  sensor  fields 
while  addressing  sampling  irregularities  due  to  random  sensor  placement.  Others  consider  localized 
processing  of  signals  on  graphs  in  multiresolution  fashion  by  representing  data  using  wavelet-like  bases 
with  varying  "smoothness"  or  defining  transforms  based  on  node  neighborhoods.  In  the  latter  case,  the 
graph Laplacian and its eigenbasis are sometimes used to define a spectrum and a Fourier transform of a
signal on a graph. This definition of a Fourier transform was also proposed for use in uncertainty analysis
on graphs [40], [41]. This graph Fourier transform is derived from the graph Laplacian and restricted to
undirected  graphs  with  real,  non-negative  edge  weights,  not  extending  to  data  indexed  by  directed  graphs 
or  graphs  with  negative  or  complex  weights. 
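
To make the restriction concrete, the Laplacian-based construction can be sketched as below. The symbols are assumptions introduced here: W is the symmetric matrix of non-negative edge weights, D the diagonal matrix of weighted node degrees, and s a signal on the graph.

```latex
L \;=\; D - W,
\qquad
L \;=\; U \Lambda U^{T},
\qquad
\hat{s} \;=\; U^{T} s .
```

The orthogonal eigenbasis U and the real, non-negative spectrum \Lambda are guaranteed because W is symmetric with non-negative entries; neither property holds in general for directed graphs or for graphs with negative or complex weights.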

The  algebraic  signal  processing  (ASP)  theory  [42],  [43],  [44],  [45]  is  a  formal,  algebraic  approach  to  analyze 
data  indexed  by  special  types  of  line  graphs  and  lattices.  The  theory  uses  an  algebraic  representation  of 
signals  and  filters  as  polynomials  to  derive  fundamental  signal  processing  concepts.  This  framework  has 
been  used  for  discovery  of  fast  computational  algorithms  for  discrete  signal  transforms  [42],  [46],  [47].  It 
was  extended  to  multidimensional  signals  and  nearest  neighbor  graphs  [48],  [49]  and  applied  in  signal 
compression [50], [51]. The framework that we developed in this project generalizes and
extends  the  ASP  to  signals  on  arbitrary  graphs. 

B.  Overview  of  our  contributions 

Our  goal  was  to  develop  a  linear  discrete  signal  processing  (DSP)  framework  and  corresponding  tools  for 
datasets  arising  from  social,  biological,  and  physical  networks.  DSP  has  been  very  successful  in  processing 
time  signals  (such  as  speech,  communications,  radar,  or  econometric  time  series),  space-dependent 
signals  (images  and  other  multidimensional  signals  like  seismic  and  hyperspectral  data),  and  time-space 
signals (video). We refer to data indexed by nodes of a graph as a graph signal or simply signal and to our
approach as DSP on graphs (DSPg). We developed the basics of linear DSPg, including the notion of a shift
on  a  graph,  graph  filter  structure,  graph  filtering  and  graph  convolution,  graph  signal  and  graph  filter 
spaces  and  their  algebraic  structure,  the  graph  Fourier  transform,  graph  frequency,  graph  spectrum,  graph 
spectral  decomposition,  and  graph  impulse  and  graph  frequency  responses.  With  respect  to  other  works, 
ours is a deterministic framework for signal processing on graphs rather than a statistical approach like
graphical models. Our work is an extension and generalization of traditional DSP, and generalizes the
Algebraic  Signal  Processing  theory  [42],  [43],  [44],  [45]  and  its  extensions  and  applications  [49],  [50], 
[51].  We  emphasize  here  the  contrast  between  the  DSPg  and  the  approach  to  the  graph  Fourier  transform 
that  takes  the  graph  Laplacian  as  a  point  of  departure  [32],  [35],  [36],  [38],  [39],  [41].  In  the  latter  case, 
the Fourier transform on graphs is given by the eigenbasis of the graph Laplacian. However, this definition
is  not  applicable  to  directed  graphs,  which  often  arise  in  real-world  problems,  and  graphs  with  negative 
weights.  In  general,  the  graph  Laplacian  is  a  second-order  operator  for  signals  on  a  graph,  whereas  an 
adjacency  matrix  is  a  first-order  operator.  Deriving  a  graph  Fourier  transform  from  the  graph  Laplacian  is 
analogous  in  traditional  DSP  to  restricting  signals  to  be  even  (like  correlation  sequences)  and  Fourier 
transforms  to  represent  power  spectral  densities  of  signals.  Instead,  we  demonstrated  that  the  graph 
Fourier  transform  is  properly  defined  through  the  Jordan  normal  form  and  generalized  eigenbasis  of  the 
adjacency matrix. Finally, we illustrate DSPg with applications like classification, compression, and
linear prediction for datasets that include blogs, customers of a mobile operator, and measurements collected by a network
of irregularly placed weather stations, among many other applications.
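
The adjacency-based definition can be summarized in symbols as follows. This is a sketch in notation assumed here, not taken verbatim from the cited papers: A is the (weighted) adjacency matrix, V the matrix whose columns are the generalized eigenvectors of A, J its Jordan normal form, and s a graph signal.

```latex
A \;=\; V J V^{-1},
\qquad
\hat{s} \;=\; F s \;=\; V^{-1} s,
\qquad
s \;=\; V \hat{s} .
```

When A is diagonalizable, J reduces to \mathrm{diag}(\lambda_0, \ldots, \lambda_{N-1}) and the columns of V are ordinary eigenvectors satisfying A v_n = \lambda_n v_n. No symmetry or sign constraint on A is needed, which is why the definition covers directed graphs and negative or complex weights.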


Summary  of  results 

DSPg  extended  the  algebraic  signal  processing  (ASP)  theory  introduced  in  [42], [43], [44],  [45], [46]  where 
the  shift  is  the  elementary  non-trivial  filter  that  generates,  under  an  appropriate  notion  of  shift  invariance, 
all linear shift-invariant filters for a given class of signals. Our key insight in building the theory of
signal processing on graphs was to identify the shift operator. We adopted the weighted adjacency matrix
of  the  graph  as  the  shift  operator  and  then  developed  appropriate  concepts  of  z-transform,  impulse  and 
frequency  response,  filtering,  convolution,  and  Fourier  transform.  In  particular,  the  graph  Fourier 
transform  in  this  framework  expands  a  graph  signal  into  a  basis  of  eigenvectors  of  the  adjacency  matrix, 
and  the  corresponding  spectrum  is  given  by  the  eigenvalues  of  the  adjacency  matrix. 
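
As a minimal sketch of these definitions, consider the directed cycle graph, for which the eigenvectors of the adjacency matrix are known in closed form. The function names and the edge-direction convention (A[i][j] = 1 for an edge from node j to node i) are assumptions made for this illustration. On this graph the shift is the cyclic time delay and the graph Fourier transform coincides with the unitary discrete Fourier transform.

```python
import cmath
import math


def cycle_shift_matrix(n):
    """Adjacency matrix of the directed cycle on n nodes.

    Assumed convention: A[i][j] = 1 if there is an edge from node j to node i,
    so (A s)[i] picks up the signal value at the in-neighbor of node i.
    """
    return [[1.0 if j == (i - 1) % n else 0.0 for j in range(n)] for i in range(n)]


def mat_vec(A, s):
    """Matrix-vector product; with A an adjacency matrix this is the graph shift."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]


def cycle_eigenpair(n, k):
    """k-th eigenvalue and unit-norm eigenvector of the directed-cycle shift."""
    lam = cmath.exp(-2j * math.pi * k / n)
    vec = [cmath.exp(2j * math.pi * k * m / n) / math.sqrt(n) for m in range(n)]
    return lam, vec


def graph_fourier_transform(s):
    """Expand s in the eigenbasis above: s_hat = V^{-1} s.

    For the directed cycle V is unitary, so V^{-1} is its conjugate transpose.
    """
    n = len(s)
    s_hat = []
    for k in range(n):
        _, vec = cycle_eigenpair(n, k)
        s_hat.append(sum(v.conjugate() * x for v, x in zip(vec, s)))
    return s_hat


n = 4
A = cycle_shift_matrix(n)
signal = [1.0, 2.0, 3.0, 4.0]
shifted = mat_vec(A, signal)                 # cyclic delay by one sample
spectrum = graph_fourier_transform(signal)   # unitary DFT of the signal
```

For a general graph the eigenvectors are not known in closed form and would be computed numerically from the adjacency matrix.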

The  association  of  the  graph  shift  with  the  adjacency  matrix  in  DSPg  is  natural  and  has  multiple  intuitive 
interpretations.  The  graph  shift  is  an  elementary  filter,  and  its  output  is  a  graph  signal  with  the  value  at 
vertex  n  of  the  graph  given  approximately  by  a  weighted  linear  combination  of  the  input  signal  values  at 
graph  neighbors  of  n.  With  appropriate  edge  weights,  the  graph  shift  can  be  interpreted  as  a  (minimum 
mean  square)  first-order  linear  predictor.  Another  interpretation  in  DSPg  of  the  graph  shift  comes  from 
Markov  chain  theory  [W],  where  the  adjacency  matrix  represents  the  one-step  transition  probability 
matrix  of  the  chain  governing  its  dynamics.  Finally,  the  graph  shift  can  also  be  seen  as  a  stencil 
approximation  of  the  first-order  derivative  on  the  graph. 
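
The following sketch illustrates the shift as a weighted combination of neighbor values, and a filter built as a polynomial in the shift. The 4-node graph, its weights, and the filter taps are hypothetical values chosen only for illustration. The final two lines compute the same result in both orders, reflecting that a polynomial in the adjacency matrix commutes with the shift.

```python
def mat_vec(A, s):
    """Graph shift: output at node i is the weighted sum of its in-neighbors' values."""
    return [sum(a * x for a, x in zip(row, s)) for row in A]


def graph_filter(A, taps, s):
    """Apply h(A) s = taps[0]*s + taps[1]*(A s) + taps[2]*(A^2 s) + ...

    The powers of A are applied by shifting the signal repeatedly,
    so no matrix power is ever formed explicitly.
    """
    out = [0.0] * len(s)
    shifted = list(s)
    for h in taps:
        out = [o + h * x for o, x in zip(out, shifted)]
        shifted = mat_vec(A, shifted)
    return out


# Hypothetical weighted directed graph; A[i][j] is the weight of the edge j -> i.
A = [
    [0.0, 0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
s = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, -0.5, 0.25]  # hypothetical filter coefficients

shift_once = mat_vec(A, s)
filter_then_shift = mat_vec(A, graph_filter(A, taps, s))
shift_then_filter = graph_filter(A, taps, mat_vec(A, s))
```

With row-normalized weights, as in this toy matrix, the shift at each node is an average of neighboring values, which is consistent with the predictor and Markov chain readings given above.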

Because  the  eigenvalues  of  the  graph  shift  are  in  general  complex  valued,  there  is  an  issue  that  arises  in 
DSPg  with  defining  low  and  high  frequencies,  and  low-pass,  band-pass,  and  high-pass  graph  signals  or 
graph  filters.  In  DSPg,  we  defined  low  and  high  frequencies  and  low-,  high-,  and  band-pass  graph  signals 
and  filters  on  generic  graphs  in  a  novel  way.  In  traditional  discrete  signal  processing  (DSP),  these  concepts 
have  an  intuitive  interpretation,  since  the  frequency  contents  of  time  series  and  digital  images  are 
described  by  complex  or  real  sinusoids  that  oscillate  at  different  rates  [33].  The  oscillation  rates  provide  a 
physical notion of "low" and "high" frequencies: low-frequency components oscillate less and
high-frequency ones oscillate more. However, these concepts do not have a similar interpretation on graphs,
and  it  was  not  obvious  how  to  order  graph  frequencies  to  describe  the  low-  and  high-frequency  contents 
of  a  graph  signal. 

In  DSPg,  we  developed  an  ordering  of  the  graph  frequencies  that  is  based  on  how  "oscillatory"  the  graph 
spectral  components  are  with  respect  to  the  indexing  graph,  i.e.,  how  much  they  change  from  a  node  to 
neighboring  nodes.  To  quantify  this  amount,  we  introduced  the  graph  total  variation  function  that 
measures  how  much  signal  samples  (values  of  a  graph  signal  at  a  node)  vary  in  comparison  to  neighboring 
samples.  This  approach  is  analogous  to  the  approach  taken  in  classical  DSP  theory,  where  the  oscillations 
in  time  and  image  signals  are  also  quantified  by  appropriately  defined  total  variations  [33].  Once  we  have 
an  ordering  of  the  frequencies  based  on  the  graph  total  variation  function,  we  define  the  notions  of  low 
and  high  frequencies,  as  well  as  low-,  high-,  and  band-pass  graph  signals  and  graph  filters. 
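
A small sketch of this ordering is given below. It assumes one specific form of the total variation, TV(s) = || s - (1/|lambda_max|) A s ||_1, which matches the description above but is stated here as an assumption; the helper names are likewise hypothetical. The directed cycle is used because its eigenvectors are known in closed form, so the result can be compared with the classical notion of frequency.

```python
import cmath
import math


def directed_cycle_shift(s):
    """Graph shift on the directed cycle: node i receives the value of node i-1."""
    return [s[i - 1] for i in range(len(s))]


def graph_total_variation(s, shift, lam_max_abs):
    """Assumed definition: TV(s) = || s - shift(s) / |lambda_max| ||_1."""
    shifted = shift(s)
    return sum(abs(x - y / lam_max_abs) for x, y in zip(s, shifted))


n = 8
lam_max_abs = 1.0  # every eigenvalue of the directed cycle lies on the unit circle

variations = []
for k in range(n):
    # k-th eigenvector of the directed-cycle shift (a complex sinusoid).
    vec = [cmath.exp(2j * math.pi * k * m / n) / math.sqrt(n) for m in range(n)]
    tv = graph_total_variation(vec, directed_cycle_shift, lam_max_abs)
    variations.append((tv, k))

# Frequencies sorted from lowest to highest total variation.
ordering = [k for _, k in sorted(variations)]
```

Under this assumed definition the constant eigenvector (k = 0) has zero variation and is the lowest frequency, while k = n/2, the alternating-sign eigenvector, has the largest variation, in agreement with classical DSP.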


We applied DSPg in a number of important applications, demonstrating not only its wide applicability but
also the performance gains it affords when analyzing signals indexed by nodes of a graph.
Applications we considered included signal recovery on graphs, classification, compression,
semi-supervised learning, detection of anomalies, and sensor network analysis. For example, signal recovery
on  graphs  recovers  one  or  multiple  smooth  graph  signals  from  noisy,  corrupted,  or  incomplete 
measurements.  We  formulated  graph  signal  recovery  as  an  optimization  problem,  for  which  we  provided 
a general solution through the alternating direction method of multipliers. We showed how signal
inpainting,  matrix  completion,  robust  principal  component  analysis,  and  anomaly  detection  all  relate  to 
graph  signal  recovery  and  provided  corresponding  specific  solutions  and  theoretical  analysis.  We  validated 
the  proposed  methods  on  real-world  recovery  problems,  including  online  blog  classification,  bridge 
condition identification, temperature estimation, a recommender system for jokes, and expert opinion
combination for online blog classification. In another set of studies, we showed that naturally occurring
graph  signals,  such  as  measurements  of  physical  quantities  collected  by  sensor  networks  or  labels  of 
objects  in  a  dataset,  tend  to  be  low-frequency  graph  signals,  while  anomalies  in  sensor  measurements  or 
missing  data  labels  can  amplify  high-frequency  parts  of  the  signals.  We  demonstrated  how  these 
anomalies  can  be  detected  using  appropriately  designed  high-pass  graph  filters,  and  how  unknown  parts 
of  graph  signals  can  be  recovered  with  appropriately  designed  regularization  techniques.  In  particular,  our 
experiments  showed  that  classifiers  designed  using  the  graph  shift  matrix  lead  to  higher  classification 
accuracy  than  classifiers  based  on  the  graph  Laplacian  matrices,  combinatorial  or  normalized. 
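
The anomaly detection idea can be illustrated with a toy example. Everything specific here is an assumption made for the sketch: the sensors are placed on an undirected ring with edge weights 1/2, the readings are synthetic, the high-pass filter is the simplest possible choice h(A) = I - A, and the detection threshold is picked by hand. The filters used in the actual studies were designed for the graphs and data at hand.

```python
import math


def ring_average_shift(s):
    """Graph shift on an undirected ring with edge weights 1/2:
    each node receives the average of its two neighbors."""
    n = len(s)
    return [0.5 * (s[i - 1] + s[(i + 1) % n]) for i in range(n)]


def high_pass(s):
    """Simplest high-pass graph filter, h(A) = I - A.

    The largest eigenvalue of this A is 1, so the constant (lowest-frequency)
    component of the signal is removed entirely.
    """
    return [x - y for x, y in zip(s, ring_average_shift(s))]


n = 24
# Hypothetical smooth "temperature" readings around a ring of sensors.
readings = [20.0 + 5.0 * math.sin(2.0 * math.pi * i / n) for i in range(n)]
clean_response = high_pass(readings)

readings[7] += 10.0  # inject a faulty measurement at sensor 7
response = high_pass(readings)

threshold = 8.0  # hypothetical; in practice it would be tuned on data
flagged = [i for i, r in enumerate(response) if abs(r) > threshold]
```

The smooth readings produce a filter output close to zero at every node, whereas the corrupted sensor produces a large output, so thresholding the high-pass response isolates it.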


The  specific  framework,  theoretical  results,  analysis  methods,  and  application  studies  carried  out  were 
detailed in papers [1] through [12]; see Section C.